We provide a quantitative analysis of the phenomenon of crowding of near-extreme events by computing 
exactly the density of states (DOS) near the maximum of a set of independent and identically distributed random 
variables. We show that the mean DOS converges to three different limiting forms depending on whether the 
tail of the distribution of the random variables decays slower than, faster than, or as a pure exponential function. 
We argue that some of these results would remain valid even for certain correlated cases and verify it for power- 
law correlated stationary Gaussian sequences. Satisfactory agreement is found between the near-maximum 
crowding in the summer temperature reconstruction data of western Siberia and the theoretical prediction. 

Extreme value statistics (EVS) [1], the statistics of the maximum or the
minimum value of a set of random observations, has seen a recent resurgence
of interest due to its applications in diverse fields such as physics [2],
engineering [3], computer science [4], finance [5], hydrology [6], and
atmospheric sciences [7]. In particular, for independent and identically
distributed (i.i.d.) observations from a common probability density function
(PDF) $p(X)$, the EVS is governed by one of three well-known limit laws [1],
namely (a) Fréchet, (b) Gumbel, or (c) Weibull, depending on whether the tail
of $p(X)$ is (a) power-law, (b) faster than any power law but unbounded, or
(c) bounded, respectively. Recently, these same limiting laws have also been
observed in a seemingly different problem concerning the level density of a
Bose gas and the integer partition problem [8].

While EVS is very important, an equally important issue concerns the
near-extreme events [9], i.e., how many events occur with values near the
extreme? In other words, is the global maximum (or minimum) value very far
from the others (is it lonely at the top?), or are there many other events
whose values are close to the maximum value? This issue of the crowding of
near-extreme events arises in many problems. For instance, in disordered
systems, the low-temperature properties are governed by the spectral density
function of the excited states near the ground state. In the study of weather
and climate extremes, an important question is: how often do extreme
temperature events such as heat waves and cold waves occur? While it is very
important for an insurance company to safeguard itself against excessively
large claims, it is equally or maybe more important to guard itself against
an unexpectedly high number of them. In many optimization problems, finding
the exact optimal solution is extremely hard and the only practical solutions
available are the near-optimal ones [10]. In these situations, prior
knowledge about the crowding of solutions near the optimal one is highly
desirable.

In this Letter, we study quantitatively the phenomenon of the crowding of
events near the extreme value for i.i.d. random variables, and find rather
rich and often universal behavior. In general, the events that occur in
nature are correlated. However, when the correlations among them are not very
strong, their EVS converges to that of i.i.d. random variables [11]. This is
why the limiting laws of EVS of i.i.d. random variables are very useful. Here
we consider i.i.d. random variables in the spirit of the random-energy model
[12] for disordered systems, which, despite the simplifying assumption that
the energy levels are i.i.d. random variables, has been successful in
capturing many qualitative features of complex spin-glass systems. Moreover,
we provide an example of a power-law correlated case, where the behavior of
near-extreme events converges to that of i.i.d. random variables. In
addition, by comparing the near-maximum crowding in the reconstructed summer
temperature data of western Siberia against the prediction for i.i.d. random
variables, we find satisfactory agreement.

We start with a sequence of $N$ i.i.d. random observations
$\{X_1, X_2, \ldots, X_N\}$, drawn from a common PDF $p(X)$. Let $X_{\max}$
be the maximum of the sequence, i.e., $X_{\max} = \max(X_1, X_2, \ldots, X_N)$.
A natural measure of the crowding of events near $X_{\max}$ is the density of
states (DOS) with respect to the maximum,

$$\rho(r,N) = \frac{1}{N}\sum_{X_i \neq X_{\max}} \delta\bigl[r - (X_{\max} - X_i)\bigr], \tag{1}$$

where $r$ is measured from the maximum value, and we do not count $X_{\max}$
itself, i.e., $\int_0^\infty \rho(r,N)\,dr = 1 - 1/N$. Clearly, $\rho(r,N)$
fluctuates from one realization of the random sequence to another, and one is
interested in knowing whether its statistical properties show any general
limiting behavior, in the same sense as one finds for the EVS. Note that,
even though the random variables are independent, the different terms in
Eq. (1) become correlated through their common maximum $X_{\max}$.
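As an illustration of the definition above, the following minimal sketch builds a histogram estimate of the DOS relative to the maximum for a single sample. The function name `dos_relative_to_max`, the bin settings, and the choice of an exponential parent are assumptions made only for this example.

```python
import random

def dos_relative_to_max(sample, bin_width, n_bins):
    """Histogram estimate of rho(r, N): gaps r = X_max - X_i, maximum excluded."""
    n = len(sample)
    i_max = max(range(n), key=sample.__getitem__)  # index of one maximal entry
    x_max = sample[i_max]
    density = [0.0] * n_bins
    for i, x in enumerate(sample):
        if i == i_max:
            continue  # X_max itself is not counted
        k = int((x_max - x) / bin_width)
        if k < n_bins:
            # every event carries weight 1/N, spread over one bin
            density[k] += 1.0 / (n * bin_width)
    return density

rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(1000)]  # assumed parent: exp(-X)
rho = dos_relative_to_max(sample, bin_width=0.1, n_bins=400)
total = sum(rho) * 0.1  # approximates the integral of rho over r
print(total)
```

Because the maximum is excluded, the histogram integrates to 1 - 1/N rather than to 1, provided the bins cover all gaps.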

We find that the mean DOS $\bar{\rho}(r,N)$ displays rather rich limiting
behavior as $N \to \infty$. If the tail of the parent distribution $p(X)$ of
the random variables decays slower than a pure exponential function, the
behavior of $\bar{\rho}(r,N)$ is governed by the corresponding extreme value
distribution. On the other hand, when the tail of $p(X)$ decays faster than a
pure exponential, it is related to the parent distribution itself. In the
borderline case when $p(X)$ has a pure exponential tail, $\bar{\rho}(r,N)$ is
entirely different.

To find $\bar{\rho}(r,N)$, first consider Eq. (1) for a given value of the
maximum at $X_{\max} = x$. Then the remaining $(N-1)$ variables are
distributed independently according to the common conditional PDF
$p_{\rm cond}(X, x) = p(X)/\int_{-\infty}^{x} p(y)\,dy$. Hence the
conditional mean DOS, from Eq. (1), is
$\bar{\rho}_{\rm cond}(r,N,x) = [(N-1)/N]\,p_{\rm cond}(x-r, x)$. For a set
of $N$ i.i.d. random variables, the PDF of their maximum value
$X_{\max} = x$ is

$$p_{\max}(x,N) = N\,p(x)\left[\int_{-\infty}^{x} p(y)\,dy\right]^{N-1}. \tag{2}$$

Thus, $\bar{\rho}(r,N) = \int_{-\infty}^{\infty} \bar{\rho}_{\rm cond}(r,N,x)\,p_{\max}(x,N)\,dx$.

Upon substituting the expressions for $\bar{\rho}_{\rm cond}(r,N,x)$ and
$p_{\max}(x,N)$, a little algebra shows that

$$\bar{\rho}(r,N) = \int_{-\infty}^{\infty} p(x-r)\,p_{\max}(x,N-1)\,dx. \tag{3}$$

This is the key result, which is valid for all $N$. We next analyze its
limiting behavior for large $N$.
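Multiplying the conditional mean DOS by the PDF of the maximum gives the integrand (N-1) p(x-r) p(x) F(x)^(N-2), where F is the cumulative distribution of p. A minimal numerical sketch of that integral follows, assuming a pure exponential parent p(X) = exp(-X) on X >= 0. The trapezoidal rule, the function names, and the integration window are choices made for this example only. For this parent the value at r = 0 reduces to a Beta integral and equals 1/N, which gives a convenient check.

```python
import math

def mean_dos(r, n, pdf, cdf, x_lo, x_hi, steps=20000):
    """Trapezoidal estimate of (N-1) * int p(x-r) p(x) F(x)^(N-2) dx on [x_lo, x_hi]."""
    h = (x_hi - x_lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = x_lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # end points get half weight
        total += weight * pdf(x - r) * pdf(x) * cdf(x) ** (n - 2)
    return (n - 1) * h * total

def exp_pdf(x):
    # assumed parent density, zero for negative arguments
    return math.exp(-x) if x >= 0.0 else 0.0

def exp_cdf(x):
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0

value = mean_dos(0.0, 10, exp_pdf, exp_cdf, 0.0, 40.0)
print(value, 1.0 / 10)
```

For r > 0 the integrand jumps at x = r for this parent, so a finer grid or a split of the integration range at x = r would be needed for comparable accuracy.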

For i.i.d. random variables, it is known that $p_{\max}(x,N)$ has a limiting
distribution [1]:

$$p_{\max}(x,N) \xrightarrow{N\to\infty} \frac{1}{b_N}\,f\!\left(\frac{x-a_N}{b_N}\right). \tag{4}$$

The non-universal scale factors $a_N$ and $b_N$ depend explicitly on the
parent distribution $p(X)$ and on $N$. However, the scaling function $f(z)$
is universal and belongs to (a) Fréchet, (b) Gumbel, or (c) Weibull,
depending only on the tail of $p(X)$. For example, if
$p(X) \sim \exp(-X^{\delta})$ for large $X$, then
$a_N \sim (\ln N)^{1/\delta}$ and
$b_N \sim \delta^{-1}(\ln N)^{1/\delta-1}$ for large $N$, and the scaling
function is the universal Gumbel PDF $f(z) = \exp[-z - \exp(-z)]$. Note
that, as $N \to \infty$, $b_N \to \infty$ for $\delta < 1$, whereas
$b_N \to 0$ for $\delta > 1$. In fact, this large-$N$ behavior of $b_N$ is
not restricted to this specific tail of $p(X)$, but is more generic: for any
tail of $p(X)$ slower than $\exp(-X)$, $b_N$ increases as $N$ increases,
whereas for any tail faster than $\exp(-X)$, $b_N$ decreases as $N$
increases. This is indeed responsible for the generic limiting behavior of
$\bar{\rho}(r,N)$.
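The opposite trends of the width for slower-than-exponential and faster-than-exponential tails can be seen by evaluating the leading-order expressions quoted above, a_N ~ (ln N)^(1/delta) and b_N ~ (ln N)^(1/delta - 1)/delta. The sketch below treats these asymptotic forms as exact, which is an assumption; the function name `scale_factors` is hypothetical.

```python
import math

def scale_factors(n, delta):
    """Leading-order a_N and b_N for a tail p(X) ~ exp(-X**delta)."""
    log_n = math.log(n)
    a_n = log_n ** (1.0 / delta)
    b_n = log_n ** (1.0 / delta - 1.0) / delta
    return a_n, b_n

# b_N for N = 10^2, 10^4, 10^8 and three values of delta
widths = {
    delta: [scale_factors(10 ** k, delta)[1] for k in (2, 4, 8)]
    for delta in (0.5, 1.0, 2.0)
}
for delta, values in widths.items():
    print(delta, values)
```

The printed widths grow with N for delta = 0.5, stay fixed at 1 for delta = 1, and shrink for delta = 2.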

When $p(X)$ has a slower than exponential tail, so that $b_N \to \infty$ as
$N \to \infty$, it is useful to make the change of variable
$x = a_N + b_N z$ in Eq. (3). Then one immediately realizes that
$p(b_N z + a_N - r)$ is highly localized, in the limit $N \to \infty$,
compared to $f(z)$, i.e.,
$b_N\,p(b_N z + a_N - r) \to \delta(z - [r - a_N]/b_N)$. Therefore, in the
scaling region of order $b_N$ around $r = a_N$,

$$\bar{\rho}(r,N) \xrightarrow{N\to\infty} \frac{1}{b_N}\,f\!\left(\frac{r-a_N}{b_N}\right). \tag{5}$$

On the other hand, if the tail of $p(X)$ is faster than exponential, so that
$b_N \to 0$ as $N \to \infty$, the PDF of the maximum becomes highly
localized near $x = a_N$, i.e., $p_{\max}(x,N) \to \delta(x - a_N)$.
Therefore, Eq. (3) yields

$$\bar{\rho}(r,N) \xrightarrow{N\to\infty} p(a_N - r). \tag{6}$$
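The two limits can be written out in one chain of steps. The block below is a sketch under stated assumptions: it starts from the mean DOS written as the integral of p(x-r) against the PDF of the maximum of N-1 variables, and it replaces that PDF by its large-N scaling form with scale factors a_N and b_N (so the difference between N-1 and N is neglected).

```latex
\begin{align*}
\bar{\rho}(r,N)
  &= \int p(x-r)\,p_{\max}(x,N-1)\,dx
   \;\approx\; \int p(a_N + b_N z - r)\,f(z)\,dz ,
   \qquad x = a_N + b_N z, \\[4pt]
b_N \to \infty:\quad
  & p(a_N + b_N z - r) \;\approx\; \frac{1}{b_N}\,
    \delta\!\left(z - \frac{r-a_N}{b_N}\right)
    \;\Longrightarrow\;
    \bar{\rho}(r,N) \to \frac{1}{b_N}\,f\!\left(\frac{r-a_N}{b_N}\right), \\[4pt]
b_N \to 0:\quad
  & p(a_N + b_N z - r) \;\approx\; p(a_N - r)
    \;\Longrightarrow\;
    \bar{\rho}(r,N) \to p(a_N - r)\int f(z)\,dz = p(a_N - r).
\end{align*}
```

In the first case the parent density is the narrow factor and picks out one value of the scaling function; in the second case the roles are reversed and the normalized scaling function integrates to one.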
In EVS, the convergence towards the limiting distribution is usually very
slow [13]. Therefore, it is instructive to check how $\bar{\rho}(r,N)$
approaches the limiting form for large $N$. For this purpose, we now
consider explicit forms of $p(X)$ such that $\bar{\rho}(r,N)$ can be
computed to high accuracy for any given $N$ by numerically integrating
Eq. (3), and such that explicit forms of $a_N$ and $b_N$ as functions of $N$
can be obtained. The mean number of events close to the maximum, for a
finite but large sample of size $N$, is proportional to $\bar{\rho}(0,N)$.
In certain cases, $r = 0$ is part of the scaling regime and
$\bar{\rho}(0,N)$ can be obtained from the scaling form of
$\bar{\rho}(r,N)$ by putting $r = 0$. However, sometimes $r = 0$ is not part
of the scaling regime and $\bar{\rho}(0,N)$ has to be computed separately
from Eq. (3). For simplicity, we consider only positive random variables.

A. Power-law tail. Consider
$p(X) = \alpha \exp(-X^{-\alpha})\,X^{-(1+\alpha)}$, where $\alpha > 0$. In
this case, $a_N = 0$ and $b_N = N^{1/\alpha}$. Therefore, the limiting
$\bar{\rho}(r,N)$ is given by Eq. (5), with $f(z)$ belonging to the Fréchet
class:

$$f(z) = f_1(z) = \frac{\alpha}{z^{1+\alpha}}\exp(-z^{-\alpha}). \tag{7}$$

Figure 1A compares this limiting form with the results obtained from Eq. (3)
by evaluating the integral numerically. Here, $r = 0$ lies outside the
scaling regime. Thus, $\bar{\rho}(0,N)$ has to be obtained directly from
Eq. (3).
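The scale factors quoted for the power-law case can be checked directly: for this parent the cumulative distribution is exp(-X^(-alpha)), so the cumulative distribution of the maximum of N draws, rescaled by b_N = N^(1/alpha) with a_N = 0, coincides with the parent one. The sketch below verifies this identity numerically; the function names and the test values of z are assumptions for illustration.

```python
import math

def frechet_cdf(x, alpha):
    """CDF of p(X) = alpha * exp(-X**-alpha) * X**-(1+alpha) on X > 0."""
    return math.exp(-x ** (-alpha)) if x > 0.0 else 0.0

def max_cdf(x, n, alpha):
    """CDF of the maximum of n i.i.d. draws: F(x)**n."""
    return frechet_cdf(x, alpha) ** n

alpha, n = 2.0, 1000
b_n = n ** (1.0 / alpha)
for z in (0.5, 1.0, 3.0):
    # rescaling x = b_n * z maps the CDF of the maximum onto the parent CDF
    print(z, max_cdf(b_n * z, n, alpha), frechet_cdf(z, alpha))
```

The two printed columns agree to rounding error for every N, which is why the Fréchet scaling form holds here without finite-size corrections to the distribution of the maximum.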
B. Faster than power-law, but unbounded tail. Consider
$p(X) = \delta X^{\delta-1}\exp(-X^{\delta})$, where $\delta > 0$. In this
case $a_N = (\ln N)^{1/\delta}$ and
$b_N = \delta^{-1}(\ln N)^{1/\delta-1}$. For very large and very small $r$,
the large-$N$ forms of the mean DOS are the same for all $\delta$, i.e.,
$\bar{\rho}(r,N) \sim N p(r)$ for $r \gg a_N$, and
$\bar{\rho}(r,N) \approx p(a_N - r)$ for $r \ll a_N$. Thus, at $r = 0$,

$$\bar{\rho}(0,N) \approx p(a_N) = \frac{\delta}{N}(\ln N)^{1-1/\delta}, \tag{9}$$

for all $\delta$. However, the scaling behaviors of $\bar{\rho}(r,N)$ are
very different for the three cases $\delta < 1$, $\delta = 1$, and
$\delta > 1$.

Case I: $\delta < 1$. As $N \to \infty$, $b_N \to \infty$. Therefore, in the
scaling regime around $r = a_N$ (whose width, however, grows with $N$ as
$b_N$ becomes larger), the limiting $\bar{\rho}(r,N)$ is again given by
Eq. (5), but now $f(z)$ belongs to the Gumbel class:

$$f(z) = f_2(z) = \exp[-z - \exp(-z)]. \tag{10}$$

Figure 1B(a) compares the limiting form with the results obtained from
Eq. (3) by numerical integration.

Case II: $\delta = 1$. In this case $b_N = 1$. In this borderline case
neither of the limiting forms, i.e., Eq. (5) or Eq. (6), is reached in the
large-$N$ limit. Instead, we find a completely different behavior:
$\bar{\rho}(r,N) = g(r - a_N)$, where $g(z)$ is a scaling function.
FIG. 1: (Color online). A: $\bar{\rho}(r,N)$ for $N = 10^2$ (blue), $10^3$
(red) and $10^4$ (green), for the power-law distribution
$p(X) = \alpha\exp(-X^{-\alpha})X^{-(1+\alpha)}$, with $\alpha = 2$. The
dashed (black) line plots the Fréchet distribution $f_1(r/b_N)$.
B: $\bar{\rho}(r,N)$ for exponential decay
$p(X) = \delta X^{\delta-1}\exp(-X^{\delta})$. (a) For $\delta = 1/2$, with
$N = 10^3$ (blue), $10^5$ (red) and $10^7$ (green). The dashed (black) line
plots the Gumbel distribution $f_2([r - a_N]/b_N)$. (b) For $\delta = 2$,
with $N = 10^3$ (blue), $10^6$ (red) and $10^9$ (green). The dashed (black)
line plots $p(a_N - r)$.
C: $\bar{\rho}(r,N)$ for the bounded distribution
$p(X) = \beta a^{-\beta}(a - X)^{\beta-1}$ for $X < a$ and $p(X) = 0$ for
$X > a$, where $a = 10$. (a) For $\beta = 3/2$, with $N = 10^2$ (blue),
$10^3$ (red) and $10^4$ (green). (b) For $\beta = 1/2$, with $N = 10$
(blue), $10^2$ (red) and $10^3$ (green). The dashed (black) lines plot
$p(a - r)$.
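For the borderline case of a pure exponential parent, a scaling function of the shifted variable r - a_N can be worked out from the exact integral for the mean DOS. The block below is a derivation sketch reconstructed here, not copied from the source: it assumes p(X) = exp(-X) on X >= 0, so that a_N = ln N and b_N = 1, and it uses the integrand (N-1) p(x-r) p(x) F(x)^(N-2) with F(x) = 1 - exp(-x).

```latex
\begin{align*}
\bar{\rho}(r,N)
  &= (N-1)\,e^{r}\int_{r}^{\infty} e^{-2x}\,\bigl(1-e^{-x}\bigr)^{N-2}\,dx \\
  &\xrightarrow{\;x = \ln N + y,\; N\to\infty\;}
     e^{z}\int_{z}^{\infty} e^{-2y}\,\exp\!\bigl(-e^{-y}\bigr)\,dy ,
     \qquad z = r - \ln N, \\
  &= e^{z}\int_{0}^{e^{-z}} t\,e^{-t}\,dt
   \;=\; e^{z}\Bigl[\,1-\bigl(1+e^{-z}\bigr)\exp\!\bigl(-e^{-z}\bigr)\Bigr]
   \;\equiv\; g(z).
\end{align*}
```

As a consistency check, g(z) tends to exp(z) = p(a_N - r) for z going to minus infinity, and to exp(-z)/2 for z going to plus infinity, which is proportional to N p(r).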



Case III: $\delta > 1$. As $N \to \infty$, $b_N \to 0$. Thus,
$\bar{\rho}(r,N)$ now converges to the other form, given by Eq. (6), which
is compared in Fig. 1B(b) with the results obtained from Eq. (3) by
evaluating the integral numerically.

C. Bounded tail. Consider $p(X) = \beta a^{-\beta}(a - X)^{\beta-1}$ for
$0 < X < a$, where $\beta > 0$, and $p(X) = 0$ otherwise. In this case,
$a_N = a$ and $b_N = a N^{-1/\beta}$. Therefore, $\bar{\rho}(r,N)$ again
converges to the form given by Eq. (6). The comparison with Eq. (3) is
illustrated in Fig. 1C. Again, the $N$ dependence of $\bar{\rho}(0,N)$ for
large $N$ does not follow from the limiting $\bar{\rho}(r,N)$. It has to be
obtained directly from Eq. (3).

To summarize the explicit results: When the tail of $p(X)$ is either
power-law or bounded, the convergence of $\bar{\rho}(r,N)$ to the respective
limits given by Eqs. (5) and (6) is fast, as can be seen from Figs. 1A and
1C, respectively. However, in the intermediate situation, i.e., when $p(X)$
decays faster than a power law but is not bounded, the convergence is slow,
as can be seen from Figs. 1B(a) and 1B(b). In other words, the more $p(X)$
deviates from $\exp(-X)$ in either direction (slower or faster), the more
quickly $\bar{\rho}(r,N)$ converges (with increasing $N$) to its limiting
form. As $N$ increases, the mean number of events close to the maximum,
which is proportional to $\bar{\rho}(0,N)$, decreases faster for $p(X)$ with
a broader tail [cf. Eqs. (8), (9) and (12)]. This is also evident from the
small-$r$ behavior of $\bar{\rho}(r,N)$ in the scaling regime, i.e., from
the peak to the left in Figs. 1A and 1B(a): For $p(X)$ with a power-law
tail, $\bar{\rho}(r,N)$ has an essential singularity of the form
$\exp(-N/r^{\alpha})$ for small $r$ [cf. Eq. (7)], and for a
stretched-exponential tail (B with $\delta < 1$), as $r$ decreases from
$a_N$ in the scaling regime, $\bar{\rho}(r,N)$ decreases
super-exponentially as $\exp(-\exp([a_N - r]/b_N))$ [cf. Eq. (10)]. On the
contrary, for $p(X)$ with a tail faster than $\exp(-X)$, there is crowding
near the maximum value ($r = 0$) [Figs. 1B(b) and 1C].



Another measure of the loneliness of the maximum is the gap between the
maximum and the next highest value. Let $Q(\epsilon|N)$ be the PDF of the
gap being $\epsilon$. Clearly,

$$Q(\epsilon|N) = N\int_{-\infty}^{\infty} p(z+\epsilon)\,p_{\max}(z,N-1)\,dz. \tag{13}$$

In particular, when $p(X) \sim \exp(-X^{\delta})$ for large $X$, we find the
limiting form

$$Q(\epsilon|N) \xrightarrow{N\to\infty} \frac{1}{b_N}\exp(-\epsilon/b_N). \tag{14}$$

Thus, the typical gap is of the order of $b_N$, which increases (decreases)
as $N$ increases for $\delta < 1$ ($\delta > 1$), consistent with the
results obtained from the study of the mean DOS.
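The statement that the typical gap is of order b_N can be probed with a small Monte Carlo experiment. The sketch below assumes a pure exponential parent, for which b_N = 1 and the gap between the two largest values is exponentially distributed with unit mean for any N; the sample size, number of trials, and seed are arbitrary choices for this example.

```python
import heapq
import random

def top_gap(sample):
    """Gap between the largest and the second-largest value."""
    first, second = heapq.nlargest(2, sample)
    return first - second

rng = random.Random(1)
n, trials = 50, 20000
gaps = [top_gap([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)]
mean_gap = sum(gaps) / trials
print(mean_gap)  # should be close to b_N = 1 for an exponential parent
```

Repeating the experiment with a stretched-exponential or Gaussian parent would show the mean gap growing or shrinking with N, respectively, at the cost of a less trivial sampler.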

So far, we have considered the case of i.i.d. random variables. What would
happen if the random variables are correlated? For short-ranged
correlations, one expects the results for i.i.d. random variables to hold.
However, for a stationary Gaussian sequence (SGS), this holds even for
long-range (e.g., power-law) correlations. More precisely, for an SGS a
rigorous theorem [11] states: if the correlator
$C(n) = \langle X_i X_{i+n}\rangle$ satisfies either
$\lim_{n\to\infty} C(n)\ln n = 0$ or $\sum_{n=1}^{\infty} C^2(n) < \infty$,
then the limiting distribution of the maximum [cf. Eq. (4)] is Gumbel
[cf. Eq. (10)], and $a_N$ and $b_N$ are the same as those in the case of
independent Gaussian random variables. Based on this theorem, one therefore
predicts that $\bar{\rho}(r,N)$ for large $N$ should be independent of the
correlation function $C(n)$ and hence would be the same as that of Gaussian
i.i.d. random variables. We have indeed verified this prediction for SGSs
with a power-law correlator $C(n) = (1 + n^2)^{-\gamma/2}$, which are
generated by numerical simulation. We compute $\bar{\rho}(r,N)$ from these
sequences for three different values of $N$, and for each $N$ two different
values of $\gamma$, and compare with the result obtained by numerically
integrating Eq. (3) for the same $N$ using
$p(X) = \exp(-X^2/2)/\sqrt{2\pi}$; this is shown in Fig. 2. While for
smaller $N$ [cf. Fig. 2(a)] they differ, for larger $N$ [cf. Fig. 2(c)] the
difference becomes unnoticeable.
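The text does not say how the correlated Gaussian sequences were generated. One simple way to produce a stationary Gaussian sequence with a prescribed correlator is a Cholesky factorization of the covariance matrix; the sketch below uses that approach as a swapped-in technique, assuming the correlator (1 + n^2)^(-gamma/2) with unit variance. It costs O(N^3) operations, so it is practical only for short sequences, far below the lengths used in Fig. 2. All function names are hypothetical.

```python
import math
import random

def correlator(n, gamma):
    """Power-law correlator C(n) = (1 + n^2)^(-gamma/2)."""
    return (1.0 + n * n) ** (-gamma / 2.0)

def cholesky_toeplitz(length, gamma):
    """Lower-triangular L with L L^T equal to the matrix C(|i - j|)."""
    L = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1):
            s = correlator(i - j, gamma) - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gaussian_sequence(L, rng):
    """One correlated sequence X = L xi, with xi i.i.d. standard normal."""
    xi = [rng.gauss(0.0, 1.0) for _ in L]
    return [sum(L[i][k] * xi[k] for k in range(i + 1)) for i in range(len(L))]

L = cholesky_toeplitz(64, gamma=0.5)
x = gaussian_sequence(L, random.Random(2))
print(len(x), x[:3])
```

Feeding many such sequences into a histogram of gaps to the maximum would give an estimate of the mean DOS to compare with the i.i.d. Gaussian prediction; for long sequences a spectral (FFT-based) generator would be the more usual choice.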
FIG. 2: (Color online). $\bar{\rho}(r,N)$ for a stationary Gaussian random
sequence with correlator $C(n) = (1 + n^2)^{-\gamma/2}$, where
$\gamma = 0.5$ (blue) and $\gamma = 1$ (red), obtained from numerical
simulation, and for Gaussian i.i.d. random variables (black dashed),
obtained by numerical integration of Eq. (3). The three sets of curves
correspond to three different values of $N$: (a) $N = 1025$,
(b) $N = 16385$, and (c) $N = 262145$.



How well do the mathematical results describe real data? That is what we
check last in this Letter, by comparing against the reconstructed Yamal
multimillennial summer temperature data of Hantemirov and Shiyatov [14]. The
reconstructed data set consists of yearly mean summer temperature anomalies
($\Delta T$) of the Yamal Peninsula of western Siberia, relative to the mean
of the full reconstructed series for 4000 years (2000 BC to AD 1996), which
is shown in Fig. 3(a). We divide the full time series into blocks of $N$
years, and for each block: (I) find the maximum value of $\Delta T$, and
then (II) with respect to this maximum, compute $\rho(r,N)$ using Eq. (1).
Finally, we find $\bar{\rho}(r,N)$ by averaging over all the blocks. The
histograms in Fig. 3(c) and (d) illustrate $\bar{\rho}(r,N)$, computed by
dividing the full series into 40 blocks with 100 years of data in each
block, and 4 blocks with 1000 years of data in each block, respectively. To
compare with our results, we first compute the distribution of $\Delta T$
from the full time series, which is illustrated in Fig. 3(b) by the
histogram, along with the solid line given by the Gaussian distribution. In
Fig. 3(c) and (d), the solid lines are computed using the Gaussian
distribution in Eq. (3), by performing exact numerical integration, with
$N = 100$ and $N = 1000$, respectively. The dashed lines correspond to the
limiting form $p(a_N - r)$, obtained in Eq. (6) for large $N$. The agreement
between them (dashed and solid lines) is satisfactory.

FIG. 3: (a) Yamal Peninsula June-July mean temperature anomaly ($\Delta T$)
reconstruction series [14]. (b) The histogram plots the distribution of
$\Delta T$ for the data shown in (a). The solid line represents
$p(\Delta T) = \exp(-\Delta T^2/2)/\sqrt{2\pi}$. In (c) and (d), the
histograms plot the mean DOS relative to the maximum (excluding the
maximum), computed by dividing the data into blocks, with each block
consisting of $N$ years. Solid lines are calculated using exact numerical
integration of Eq. (3). The dashed lines represent $p(a_N - r)$, where
$a_N = (2\ln N)^{1/2} - (2\ln N)^{-1/2}(\ln\ln N + \ln 4\pi)/2$.
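The block procedure described above, steps (I) and (II) followed by an average over blocks, can be sketched as follows. The series used here is synthetic Gaussian noise standing in for the temperature anomalies, not the Yamal data; the function name `block_mean_dos` and the bin settings are assumptions for illustration.

```python
import random

def block_mean_dos(series, block_len, bin_width, n_bins):
    """Average over blocks of the histogram of gaps to each block maximum."""
    n_blocks = len(series) // block_len
    mean_density = [0.0] * n_bins
    for b in range(n_blocks):
        block = series[b * block_len:(b + 1) * block_len]
        # step (I): locate the maximum of the block
        i_max = max(range(block_len), key=block.__getitem__)
        # step (II): histogram the gaps to that maximum, excluding the maximum
        for i, value in enumerate(block):
            if i == i_max:
                continue
            k = int((block[i_max] - value) / bin_width)
            if k < n_bins:
                mean_density[k] += 1.0 / (block_len * bin_width * n_blocks)
    return mean_density

rng = random.Random(3)
series = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # synthetic stand-in series
rho_bar = block_mean_dos(series, block_len=100, bin_width=0.25, n_bins=60)
area = sum(rho_bar) * 0.25
print(area)
```

With 4000 points and blocks of 100, this averages over 40 blocks, mirroring the first of the two block sizes used for the temperature series; each block contributes an area of 1 - 1/N to the histogram.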